The data relate to direct marketing campaigns (phone calls) of a Portuguese banking institution. More than one contact with the same client was often required in order to determine whether the product (a bank term deposit) would be subscribed ('yes') or not ('no'). The classification goal is therefore to predict whether the client will subscribe (yes/no) a term deposit (variable y).
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
sns.set_theme(color_codes=True)
import warnings
warnings.filterwarnings("ignore")
df = pd.read_csv("bank-additional-full.csv", delimiter=";")
pd.set_option("display.max_columns", None)
df.head()
| | age | job | marital | education | default | housing | loan | contact | month | day_of_week | duration | campaign | pdays | previous | poutcome | emp.var.rate | cons.price.idx | cons.conf.idx | euribor3m | nr.employed | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 56 | housemaid | married | basic.4y | no | no | no | telephone | may | mon | 261 | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
| 1 | 57 | services | married | high.school | unknown | no | no | telephone | may | mon | 149 | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
| 2 | 37 | services | married | high.school | no | yes | no | telephone | may | mon | 226 | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
| 3 | 40 | admin. | married | basic.6y | no | no | no | telephone | may | mon | 151 | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
| 4 | 56 | services | married | high.school | no | no | yes | telephone | may | mon | 307 | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41188 entries, 0 to 41187
Data columns (total 21 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   age             41188 non-null  int64
 1   job             41188 non-null  object
 2   marital         41188 non-null  object
 3   education       41188 non-null  object
 4   default         41188 non-null  object
 5   housing         41188 non-null  object
 6   loan            41188 non-null  object
 7   contact         41188 non-null  object
 8   month           41188 non-null  object
 9   day_of_week     41188 non-null  object
 10  duration        41188 non-null  int64
 11  campaign        41188 non-null  int64
 12  pdays           41188 non-null  int64
 13  previous        41188 non-null  int64
 14  poutcome        41188 non-null  object
 15  emp.var.rate    41188 non-null  float64
 16  cons.price.idx  41188 non-null  float64
 17  cons.conf.idx   41188 non-null  float64
 18  euribor3m       41188 non-null  float64
 19  nr.employed     41188 non-null  float64
 20  y               41188 non-null  object
dtypes: float64(5), int64(5), object(11)
memory usage: 6.6+ MB
df.isnull().sum()
age               0
job               0
marital           0
education         0
default           0
housing           0
loan              0
contact           0
month             0
day_of_week       0
duration          0
campaign          0
pdays             0
previous          0
poutcome          0
emp.var.rate      0
cons.price.idx    0
cons.conf.idx     0
euribor3m         0
nr.employed       0
y                 0
dtype: int64
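Although `isnull()` reports no missing values, several categorical columns use the literal string `'unknown'` as a placeholder (visible in the `head()` output above, e.g. in `default`). A minimal sketch for surfacing these implicit missings; the helper name `count_unknowns` and the tiny `demo` frame are illustrative, not part of the notebook:

```python
import pandas as pd

def count_unknowns(frame: pd.DataFrame) -> pd.Series:
    """Count occurrences of the literal placeholder 'unknown' per text column."""
    obj_cols = frame.select_dtypes(include="object").columns
    return frame[obj_cols].apply(lambda s: (s == "unknown").sum())

# Tiny illustrative frame (not the real dataset)
demo = pd.DataFrame({"default": ["no", "unknown", "yes"], "age": [30, 41, 52]})
print(count_unknowns(demo))  # default -> 1; numeric columns are skipped
```

On the real dataframe, `count_unknowns(df)` would show how many `'unknown'` entries hide in columns such as `job`, `education`, `default`, `housing`, and `loan`.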
# Select the categorical columns
df_categoricos = df[["job", "marital", "education", "default", "housing", "loan", "contact", "month", "day_of_week",
"poutcome", "y"]]
df_categoricos.head()
| | job | marital | education | default | housing | loan | contact | month | day_of_week | poutcome | y |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | housemaid | married | basic.4y | no | no | no | telephone | may | mon | nonexistent | no |
| 1 | services | married | high.school | unknown | no | no | telephone | may | mon | nonexistent | no |
| 2 | services | married | high.school | no | yes | no | telephone | may | mon | nonexistent | no |
| 3 | admin. | married | basic.6y | no | no | no | telephone | may | mon | nonexistent | no |
| 4 | services | married | high.school | no | no | yes | telephone | may | mon | nonexistent | no |
# Select the numeric columns
df_numericos = df[["age", "duration", "campaign", "pdays", "previous", "emp.var.rate", "cons.price.idx",
"cons.conf.idx", "euribor3m", "nr.employed"]]
df_numericos.head()
| | age | duration | campaign | pdays | previous | emp.var.rate | cons.price.idx | cons.conf.idx | euribor3m | nr.employed |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 56 | 261 | 1 | 999 | 0 | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 |
| 1 | 57 | 149 | 1 | 999 | 0 | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 |
| 2 | 37 | 226 | 1 | 999 | 0 | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 |
| 3 | 40 | 151 | 1 | 999 | 0 | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 |
| 4 | 56 | 307 | 1 | 999 | 0 | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 |
# sns.countplot documentation: https://seaborn.pydata.org/generated/seaborn.countplot.html
# List of categorical variables
cat_vars = ["job", "marital", "education", "default", "housing", "loan", "contact", "month", "day_of_week", "poutcome"]
# Create a figure with subplots
fig, axs = plt.subplots(nrows=2, ncols=5, figsize=(20, 10))
axs = axs.flatten()
# Draw a countplot for each categorical variable
for i, var in enumerate(cat_vars):
    sns.countplot(x=var, hue="y", data=df_categoricos, ax=axs[i])
    axs[i].tick_params(axis="x", rotation=90)
# Adjust the spacing between subplots
fig.tight_layout()
# Show the plot
plt.show()
# sns.histplot documentation: https://seaborn.pydata.org/generated/seaborn.histplot.html
# List of categorical variables
cat_vars = ["job", "marital", "education", "default", "housing", "loan", "contact", "month", "day_of_week", "poutcome"]
# Create a figure with subplots
fig, axs = plt.subplots(nrows=2, ncols=5, figsize=(20, 10))
axs = axs.flatten()
# Draw a normalized (proportion) histogram for each categorical variable
for i, var in enumerate(cat_vars):
    sns.histplot(x=var, hue="y", data=df_categoricos, ax=axs[i],
                 multiple="fill", kde=False, element="bars", fill=True, stat="density")
    axs[i].tick_params(axis="x", rotation=90)
    axs[i].set_xlabel(var)
# Adjust the spacing between subplots
fig.tight_layout()
# Show the plot
plt.show()
Most clients who subscribe a term deposit are retired people or students.
Most clients who subscribe a term deposit were contacted by cellular phone.
Most clients who subscribe a term deposit were last contacted in October, December, March, or September.
Most clients who subscribe a term deposit had a successful outcome in the previous marketing campaign.
# sns.boxplot documentation: https://seaborn.pydata.org/generated/seaborn.boxplot.html
num_vars = ["age", "duration", "campaign", "pdays", "previous", "emp.var.rate", "cons.price.idx",
            "cons.conf.idx", "euribor3m", "nr.employed"]
fig, axs = plt.subplots(nrows=2, ncols=5, figsize=(20, 10))
axs = axs.flatten()
for i, var in enumerate(num_vars):
    sns.boxplot(x=var, data=df, ax=axs[i])
fig.tight_layout()
plt.show()
# sns.violinplot documentation: https://seaborn.pydata.org/generated/seaborn.violinplot.html
num_vars = ["age", "duration", "campaign", "pdays", "previous", "emp.var.rate", "cons.price.idx",
            "cons.conf.idx", "euribor3m", "nr.employed"]
fig, axs = plt.subplots(nrows=2, ncols=5, figsize=(20, 10))
axs = axs.flatten()
for i, var in enumerate(num_vars):
    sns.violinplot(x=var, data=df, ax=axs[i])
fig.tight_layout()
plt.show()
num_vars = ["age", "duration", "campaign", "pdays", "previous", "emp.var.rate", "cons.price.idx",
            "cons.conf.idx", "euribor3m", "nr.employed"]
fig, axs = plt.subplots(nrows=2, ncols=5, figsize=(20, 10))
axs = axs.flatten()
for i, var in enumerate(num_vars):
    sns.violinplot(x=var, y="y", data=df, ax=axs[i])
fig.tight_layout()
plt.show()
num_vars = ["age", "duration", "campaign", "pdays", "previous", "emp.var.rate", "cons.price.idx",
            "cons.conf.idx", "euribor3m", "nr.employed"]
fig, axs = plt.subplots(nrows=2, ncols=5, figsize=(20, 10))
axs = axs.flatten()
for i, var in enumerate(num_vars):
    sns.histplot(x=var, data=df, ax=axs[i])
fig.tight_layout()
plt.show()
num_vars = ['age', 'duration', 'campaign', 'pdays', 'previous', 'emp.var.rate', 'cons.price.idx',
            'cons.conf.idx', 'euribor3m', 'nr.employed']
fig, axs = plt.subplots(nrows=2, ncols=5, figsize=(20, 10))
axs = axs.flatten()
for i, var in enumerate(num_vars):
    sns.histplot(x=var, hue='y', data=df, ax=axs[i], multiple="stack")
fig.tight_layout()
plt.show()
# sns.pairplot documentation: https://seaborn.pydata.org/generated/seaborn.pairplot.html
# List of numeric variables
num_vars = ['age', 'duration', 'campaign', 'pdays', 'previous', 'emp.var.rate', 'cons.price.idx',
            'cons.conf.idx', 'euribor3m', 'nr.employed']
# Draw a scatter-plot matrix colored by the target
sns.pairplot(df, hue='y')
<seaborn.axisgrid.PairGrid at 0x136b31e06a0>
Series.unique() returns the unique values of a Series, in order of appearance. Uniqueness is determined by a hash table, so the result is NOT sorted.
https://pandas.pydata.org/docs/reference/api/pandas.Series.unique.html
df['job'].unique()
array(['housemaid', 'services', 'admin.', 'blue-collar', 'technician',
'retired', 'management', 'unemployed', 'self-employed', 'unknown',
'entrepreneur', 'student'], dtype=object)
df['marital'].unique()
array(['married', 'single', 'divorced', 'unknown'], dtype=object)
df['education'].unique()
array(['basic.4y', 'high.school', 'basic.6y', 'basic.9y',
'professional.course', 'unknown', 'university.degree',
'illiterate'], dtype=object)
df['default'].unique()
array(['no', 'unknown', 'yes'], dtype=object)
df['housing'].unique()
array(['no', 'yes', 'unknown'], dtype=object)
df['loan'].unique()
array(['no', 'yes', 'unknown'], dtype=object)
df['contact'].unique()
array(['telephone', 'cellular'], dtype=object)
df['month'].unique()
array(['may', 'jun', 'jul', 'aug', 'oct', 'nov', 'dec', 'mar', 'apr',
'sep'], dtype=object)
df['day_of_week'].unique()
array(['mon', 'tue', 'wed', 'thu', 'fri'], dtype=object)
df['poutcome'].unique()
array(['nonexistent', 'failure', 'success'], dtype=object)
df['y'].unique()
array(['no', 'yes'], dtype=object)
from sklearn import preprocessing
label_encoder = preprocessing.LabelEncoder()
df['job']= label_encoder.fit_transform(df['job'])
df['job'].unique()
array([ 3, 7, 0, 1, 9, 5, 4, 10, 6, 11, 2, 8])
from sklearn import preprocessing
label_encoder = preprocessing.LabelEncoder()
df['marital']= label_encoder.fit_transform(df['marital'])
df['marital'].unique()
array([1, 2, 0, 3])
from sklearn import preprocessing
label_encoder = preprocessing.LabelEncoder()
df['education']= label_encoder.fit_transform(df['education'])
df['education'].unique()
array([0, 3, 1, 2, 5, 7, 6, 4])
from sklearn import preprocessing
label_encoder = preprocessing.LabelEncoder()
df['default']= label_encoder.fit_transform(df['default'])
df['default'].unique()
array([0, 1, 2])
from sklearn import preprocessing
label_encoder = preprocessing.LabelEncoder()
df['housing']= label_encoder.fit_transform(df['housing'])
df['housing'].unique()
array([0, 2, 1])
from sklearn import preprocessing
label_encoder = preprocessing.LabelEncoder()
df['loan']= label_encoder.fit_transform(df['loan'])
df['loan'].unique()
array([0, 2, 1])
from sklearn import preprocessing
label_encoder = preprocessing.LabelEncoder()
df['contact']= label_encoder.fit_transform(df['contact'])
df['contact'].unique()
array([1, 0])
from sklearn import preprocessing
label_encoder = preprocessing.LabelEncoder()
df['month']= label_encoder.fit_transform(df['month'])
df['month'].unique()
array([6, 4, 3, 1, 8, 7, 2, 5, 0, 9])
from sklearn import preprocessing
label_encoder = preprocessing.LabelEncoder()
df['day_of_week']= label_encoder.fit_transform(df['day_of_week'])
df['day_of_week'].unique()
array([1, 3, 4, 2, 0])
from sklearn import preprocessing
label_encoder = preprocessing.LabelEncoder()
df['poutcome']= label_encoder.fit_transform(df['poutcome'])
df['poutcome'].unique()
array([1, 0, 2])
from sklearn import preprocessing
label_encoder = preprocessing.LabelEncoder()
df['y']= label_encoder.fit_transform(df['y'])
df['y'].unique()
array([0, 1])
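The eleven encoding cells above all repeat the same fit-transform pattern, one column at a time. They could be collapsed into a single loop; a minimal sketch, where the helper name `encode_columns` and the `demo` frame are illustrative (each column gets its own freshly fitted encoder, as in the cells above):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def encode_columns(frame: pd.DataFrame, columns) -> pd.DataFrame:
    """Label-encode each listed column with its own LabelEncoder."""
    out = frame.copy()
    for col in columns:
        out[col] = LabelEncoder().fit_transform(out[col])
    return out

# LabelEncoder assigns codes in alphabetical order: cellular=0, telephone=1
demo = pd.DataFrame({"contact": ["telephone", "cellular", "telephone"]})
encoded = encode_columns(demo, ["contact"])
print(encoded["contact"].tolist())  # [1, 0, 1]
```

In the notebook this would be `df = encode_columns(df, cat_vars + ['y'])`. Note that LabelEncoder imposes an arbitrary ordinal order on unordered categories; one-hot encoding is often preferred for nominal features.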
df.head()
| | age | job | marital | education | default | housing | loan | contact | month | day_of_week | duration | campaign | pdays | previous | poutcome | emp.var.rate | cons.price.idx | cons.conf.idx | euribor3m | nr.employed | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 56 | 3 | 1 | 0 | 0 | 0 | 0 | 1 | 6 | 1 | 261 | 1 | 999 | 0 | 1 | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | 0 |
| 1 | 57 | 7 | 1 | 3 | 1 | 0 | 0 | 1 | 6 | 1 | 149 | 1 | 999 | 0 | 1 | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | 0 |
| 2 | 37 | 7 | 1 | 3 | 0 | 2 | 0 | 1 | 6 | 1 | 226 | 1 | 999 | 0 | 1 | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | 0 |
| 3 | 40 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 6 | 1 | 151 | 1 | 999 | 0 | 1 | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | 0 |
| 4 | 56 | 7 | 1 | 3 | 0 | 0 | 2 | 1 | 6 | 1 | 307 | 1 | 999 | 0 | 1 | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | 0 |
"Y" Label
sns.countplot(x=df['y'])
df['y'].value_counts()
0    36548
1     4640
Name: y, dtype: int64
from sklearn.utils import resample
# Split into separate majority- and minority-class dataframes
df_majority = df[(df['y'] == 0)]
df_minority = df[(df['y'] == 1)]
# Upsample the minority class
df_minority_upsampled = resample(df_minority,
                                 replace=True,      # sample with replacement
                                 n_samples=36548,   # match the majority class
                                 random_state=0)    # reproducible results
# Combine the majority class with the upsampled minority class
df_upsampled = pd.concat([df_minority_upsampled, df_majority])
sns.countplot(x=df_upsampled['y'])
df_upsampled['y'].value_counts()
1    36548
0    36548
Name: y, dtype: int64
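The cell above hard-codes the majority count (36548); a more general sketch derives it from `value_counts()`. The helper name `upsample_minority` and the `demo` frame are illustrative. Note also that upsampling before the train/test split lets duplicated minority rows land in both sets; in practice it is safer to upsample only the training fold.

```python
import pandas as pd
from sklearn.utils import resample

def upsample_minority(frame: pd.DataFrame, target: str, random_state: int = 0) -> pd.DataFrame:
    """Resample every smaller class up to the size of the largest class."""
    counts = frame[target].value_counts()
    majority_n = counts.max()
    parts = []
    for cls, n in counts.items():
        part = frame[frame[target] == cls]
        if n < majority_n:
            part = resample(part, replace=True, n_samples=majority_n,
                            random_state=random_state)
        parts.append(part)
    return pd.concat(parts)

# Tiny illustrative frame: 4 negatives, 2 positives
demo = pd.DataFrame({"x": range(6), "y": [0, 0, 0, 0, 1, 1]})
balanced = upsample_minority(demo, "y")
print(balanced["y"].value_counts().to_dict())  # {0: 4, 1: 4}
```

With the real data, `upsample_minority(df, 'y')` reproduces the balanced 36548/36548 result without the magic number.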
Detecting outliers is tedious, especially across multiple data types, so different detection methods suit different distributions:
for normally distributed data, the Z-score method can be used; for skewed data, the IQR method is preferred.
def remove_outliers_iqr(df, columns):
    for col in columns:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        iqr = q3 - q1
        lower_bound = q1 - 1.5 * iqr
        upper_bound = q3 + 1.5 * iqr
        df = df[(df[col] >= lower_bound) & (df[col] <= upper_bound)]
    return df
# Columns to check for outliers
columns_to_check = ['age', 'duration', 'campaign', 'pdays', 'previous', 'emp.var.rate', 'cons.price.idx',
                    'cons.conf.idx', 'euribor3m', 'nr.employed']
# Apply the IQR-based outlier-removal function
df_clean = remove_outliers_iqr(df_upsampled, columns_to_check)
# Inspect the resulting dataframe
df_clean.head()
| | age | job | marital | education | default | housing | loan | contact | month | day_of_week | duration | campaign | pdays | previous | poutcome | emp.var.rate | cons.price.idx | cons.conf.idx | euribor3m | nr.employed | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 37017 | 25 | 8 | 2 | 7 | 1 | 2 | 0 | 0 | 3 | 3 | 371 | 1 | 999 | 0 | 1 | -2.9 | 92.469 | -33.6 | 1.044 | 5076.2 | 1 |
| 36682 | 51 | 9 | 2 | 6 | 0 | 0 | 0 | 0 | 4 | 0 | 657 | 1 | 999 | 0 | 1 | -2.9 | 92.963 | -40.8 | 1.268 | 5076.2 | 1 |
| 29384 | 45 | 7 | 2 | 7 | 0 | 0 | 0 | 1 | 0 | 0 | 541 | 1 | 999 | 0 | 1 | -1.8 | 93.075 | -47.1 | 1.405 | 5099.1 | 1 |
| 21998 | 29 | 9 | 2 | 3 | 1 | 0 | 0 | 0 | 1 | 4 | 921 | 3 | 999 | 0 | 1 | 1.4 | 93.444 | -36.1 | 4.964 | 5228.1 | 1 |
| 16451 | 37 | 10 | 2 | 2 | 1 | 2 | 2 | 0 | 3 | 4 | 633 | 1 | 999 | 0 | 1 | 1.4 | 93.918 | -42.7 | 4.963 | 5228.1 | 1 |
df_clean.shape
(49702, 21)
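The notes above mention the Z-score method for normally distributed columns, but only the IQR version is implemented. A sketch of the Z-score counterpart; the function name `remove_outliers_zscore` is hypothetical, and the threshold of 3 standard deviations is a common but arbitrary choice:

```python
import pandas as pd

def remove_outliers_zscore(frame: pd.DataFrame, columns, threshold: float = 3.0) -> pd.DataFrame:
    """Drop rows where any listed column lies more than `threshold` stds from its mean."""
    out = frame
    for col in columns:
        z = (out[col] - out[col].mean()) / out[col].std()
        out = out[z.abs() <= threshold]
    return out

# Tiny illustrative frame: 20 ordinary values plus one extreme outlier
demo = pd.DataFrame({"v": list(range(20)) + [1000]})
cleaned = remove_outliers_zscore(demo, ["v"])
print(cleaned.shape)  # (20, 1) -- the outlier row is dropped
```

A caveat: on small samples a single extreme value inflates the standard deviation, which can mask milder outliers; that is one reason IQR is preferred for skewed data, as the notes say.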
Seaborn is a Python library that makes better plots easy thanks to its heatmap() function. A heatmap is a graphical representation of data in which each value of a matrix is rendered as a color.
plt.figure(figsize=(20, 16))
sns.heatmap(df_clean.corr(), fmt='.2g', annot=True)
<AxesSubplot:>
X = df_clean.drop('y', axis=1)
y = df_clean['y']
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
To be precise, the split() method generates the train and test indices, not the data itself.
Using multiple splits can be useful for a better estimate of model performance.
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3,random_state=0)
from sklearn.svm import SVC
svc = SVC(C=1, gamma=1)
svc.fit(X_train, y_train)
SVC(C=1, gamma=1)
y_pred = svc.predict(X_test)
print('Accuracy on the training set: {:.2f}'
      .format(svc.score(X_train, y_train)))
print('Accuracy on the test set: {:.2f}'
      .format(svc.score(X_test, y_test)))
Accuracy on the training set: 0.83
Accuracy on the test set: 0.83
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, jaccard_score
print('F-1 Score : ',(f1_score(y_test, y_pred, average='micro')))
print('Precision Score : ',(precision_score(y_test, y_pred, average='micro')))
print('Recall Score : ',(recall_score(y_test, y_pred, average='micro')))
print('Jaccard Score : ',(jaccard_score(y_test, y_pred, average='micro')))
F-1 Score :  0.8321373482663805
Precision Score :  0.8321373482663805
Recall Score :  0.8321373482663805
Jaccard Score :  0.7125301481566556
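The first three micro-averaged scores above are identical by construction: in a single-label setting, micro-averaged precision, recall, and F1 all reduce to plain accuracy, so they add no information beyond the accuracy already printed. A small self-contained check on toy labels (the arrays are illustrative, not the model's predictions):

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [0, 0, 0, 1, 1, 1]
y_hat  = [0, 0, 1, 1, 1, 1]

acc = accuracy_score(y_true, y_hat)
# Micro-averaged scores collapse to accuracy for single-label data
assert f1_score(y_true, y_hat, average="micro") == acc
assert precision_score(y_true, y_hat, average="micro") == acc
assert recall_score(y_true, y_hat, average="micro") == acc

# Per-class scores (average="binary") actually differ from accuracy
print(acc, f1_score(y_true, y_hat, average="binary"))
```

For per-class insight, `average="binary"` (or the classification report) is the more informative choice.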
from sklearn.metrics import classification_report, confusion_matrix, roc_curve
print (classification_report(y_test, y_pred))
precision recall f1-score support
0 0.85 0.87 0.86 8875
1 0.80 0.78 0.79 6036
accuracy 0.83 14911
macro avg 0.83 0.82 0.83 14911
weighted avg 0.83 0.83 0.83 14911
# Confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
# sklearn's convention: rows are actual labels, columns are predicted labels
cm_matrix = pd.DataFrame(data=cm, index=['Actual Negative:0', 'Actual Positive:1'],
                         columns=['Predicted Negative:0', 'Predicted Positive:1'])
plt.figure(figsize=(9, 9))
sns.heatmap(cm_matrix, annot=True, fmt='d', cmap='Reds')
from sklearn.model_selection import GridSearchCV
param_grid = {"C":[0.1, 1, 10, 100, 1000], "gamma":[1, 0.1, 0.01, 0.001, 0.0001]}
grid = GridSearchCV(SVC(), param_grid, verbose=2)
grid.fit(X_train, y_train)
Fitting 5 folds for each of 25 candidates, totalling 125 fits
[CV] END .....................................C=0.1, gamma=1; total time= 5.6min
...
[CV] END ...............................C=1000, gamma=0.0001; total time= 4.2min
GridSearchCV(estimator=SVC(),
param_grid={'C': [0.1, 1, 10, 100, 1000],
'gamma': [1, 0.1, 0.01, 0.001, 0.0001]},
verbose=2)
grid.best_params_
{'C': 1, 'gamma': 1}
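With `refit=True` (the default), GridSearchCV also exposes the winning model, already refitted on the full training data, via `best_estimator_`. A minimal self-contained sketch on synthetic data; the notebook itself would reuse `X_train`/`y_train` instead of `make_classification`, and the variable names `search`/`best` are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the real data, to keep the example fast
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Deliberately small grid; the notebook's grid is 5x5
search = GridSearchCV(SVC(), {"C": [0.1, 1], "gamma": [0.1, 1]}, cv=3)
search.fit(X_tr, y_tr)

best = search.best_estimator_  # already refitted on all of X_tr
print(search.best_params_, best.score(X_te, y_te))
```

In the notebook, evaluating `grid.best_estimator_` on `X_test` would show what the tuned `{'C': 1, 'gamma': 1}` model actually achieves, rather than refitting kernels by hand.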
linear_classifier = SVC(kernel='linear').fit(X_train,y_train)
y_pred = linear_classifier.predict(X_test)
print('Model accuracy with linear kernel : {0:0.3f}'. format(accuracy_score(y_test, y_pred)))
Model accuracy with linear kernel : 0.826
# Confusion matrix for the linear-kernel SVM
cm = confusion_matrix(y_test, y_pred)
cm_matrix = pd.DataFrame(data=cm, index=['Actual Negative:0', 'Actual Positive:1'],
                         columns=['Predicted Negative:0', 'Predicted Positive:1'])
sns.heatmap(cm_matrix, annot=True, fmt='d', cmap='mako')
<AxesSubplot:>
print(classification_report(y_test,y_pred))
precision recall f1-score support
0 0.93 0.77 0.84 8875
1 0.73 0.91 0.81 6036
accuracy 0.83 14911
macro avg 0.83 0.84 0.82 14911
weighted avg 0.85 0.83 0.83 14911
# SVM model with the Gaussian RBF kernel
rbf_svc=SVC(kernel='rbf').fit(X_train,y_train)
y_pred = rbf_svc.predict(X_test)
print('Model accuracy with rbf kernel : {0:0.3f}'. format(accuracy_score(y_test, y_pred)))
Model accuracy with rbf kernel : 0.832
# Confusion matrix for the Gaussian RBF kernel
cm = confusion_matrix(y_test, y_pred)
cm_matrix = pd.DataFrame(data=cm, index=['Actual Negative:0', 'Actual Positive:1'],
                         columns=['Predicted Negative:0', 'Predicted Positive:1'])
sns.heatmap(cm_matrix, annot=True, fmt='d', cmap='mako')
<AxesSubplot:>
# SVM model with the polynomial kernel
Poly_svc=SVC(kernel='poly', C=1).fit(X_train,y_train)
y_pred = Poly_svc.predict(X_test)
print('Model accuracy with poly kernel : {0:0.3f}'. format(accuracy_score(y_test, y_pred)))
Model accuracy with poly kernel : 0.836
# Confusion matrix for the polynomial kernel
cm = confusion_matrix(y_test, y_pred)
cm_matrix = pd.DataFrame(data=cm, index=['Actual Negative:0', 'Actual Positive:1'],
                         columns=['Predicted Negative:0', 'Predicted Positive:1'])
sns.heatmap(cm_matrix, annot=True, fmt='d', cmap='mako')
<AxesSubplot:>
# SVM model with the sigmoid kernel
Sig_svc=SVC(kernel='sigmoid', C=1).fit(X_train,y_train)
y_pred = Sig_svc.predict(X_test)
print('Model accuracy with sigmoid kernel : {0:0.3f}'. format(accuracy_score(y_test, y_pred)))
Model accuracy with sigmoid kernel : 0.705
# Confusion matrix for the sigmoid kernel
cm = confusion_matrix(y_test, y_pred)
cm_matrix = pd.DataFrame(data=cm, index=['Actual Negative:0', 'Actual Positive:1'],
                         columns=['Predicted Negative:0', 'Predicted Positive:1'])
sns.heatmap(cm_matrix, annot=True, fmt='d', cmap='mako')
<AxesSubplot:>